


Module 4: Advanced Analytics - Theory and 
Methods 

Part 5: Naive Bayesian Classifiers 


During this Part the following topics are covered: 

• Naive Bayesian Classifier 

• Theoretical foundations of the classifier 

• Use cases 

• Evaluating the effectiveness of the classifier 

• The Reasons to Choose (+) and Cautions (-) with the use of 
the classifier 
Classifiers


Where in the catalog should I place this product listing? 

Is this email spam? 


Classification: assign labels to objects based on the object's attributes. 
Usually supervised: training set of pre-classified examples. 

Our examples: 

► Naive Bayesian 

► Decision Trees 

► (and Logistic Regression) 
Naive Bayesian Classifier

The Naive Bayesian Classifier is a probabilistic classifier based on Bayes' Law and
naive conditional independence assumptions.

Determine the most probable class label for each object 

► Based on the observed object attributes 

►► Naively, these attributes are assumed to be conditionally independent of each other given the class label

► Example: 

►► Based on the object's attributes {shape, color, weight}

►► A given object that is {spherical, yellow, < 60 grams}
may be classified (labeled) as a tennis ball

►► Even if these features depend on each other or upon the existence of the other
features, a Naive Bayesian Classifier treats each of them as contributing
independently to the probability that the object is a tennis ball.

► Class label probabilities are determined using Bayes' Law 

Input variables are discrete, but there are variations of the algorithm that work with
continuous variables as well

Output: 

► Probability score - proportional to the true probability 

► Class label - based on the highest probability score 
Naive Bayesian Classifier - Use Cases


• Preferred method for many text classification problems. 

► Try this first; if it doesn't work, try something more complicated 

• Use cases 

► Spam filtering, other text classification tasks 

► Fraud detection: for example, in auto insurance, based on a
training data set with attributes (such as driver's rating, vehicle age,
vehicle price, whether the claim is by the policy holder, police report status,
whether the claim is genuine), we can classify a new claim as genuine or not
Building a Training Dataset to Predict Good or Bad Credit


• Predict the credit behavior of a credit card applicant from the applicant's attributes:

► Personal status

► Job type

► Housing type

► Savings amount

• These are all categorical variables and are better suited to the Naive Bayesian Classifier than to logistic regression.

• If there are multiple levels for the outcome you want to predict, then the Naive Bayesian Classifier is a better solution.


| personal status | job | housing | savings status | credit class |
|---|---|---|---|---|
| male single | skilled | own | no known savings | good |
| female div/dep/mar | skilled | own | <100 | bad |
| male single | unskilled resident | own | <100 | good |
| male single | skilled | for free | <100 | good |
| male single | skilled | for free | <100 | bad |
| male single | unskilled resident | for free | no known savings | good |
| male single | skilled | own | 500<=X<1000 | good |
| male single | high qualif/self emp/mgm | rent | <100 | good |
| male div/sep | unskilled resident | own | >=1000 | good |
| male mar/wid | high qualif/self emp/mgm | own | <100 | bad |
| female div/dep/mar | skilled | rent | <100 | bad |
| female div/dep/mar | skilled | rent | <100 | bad |
| female div/dep/mar | skilled | own | <100 | good |
| male single | unskilled resident | own | <100 | bad |
| female div/dep/mar | skilled | rent | <100 | good |
| female div/dep/mar | unskilled resident | own | 100<=X<500 | bad |
| male single | skilled | own | no known savings | good |
| male single | skilled | own | no known savings | good |
| female div/dep/mar | high qualif/self emp/mgm | for free | <100 | bad |
| male single | skilled | own | 500<=X<1000 | good |
| male single | skilled | own | <100 | good |
| male single | skilled | rent | 500<=X<1000 | good |
| male single | unskilled resident | rent | <100 | good |
| male single | skilled | own | 100<=X<500 | good |
| male mar/wid | skilled | own | no known savings | good |
| male single | unskilled resident | own | <100 | good |
| male mar/wid | unskilled resident | own | <100 | good |
Technical Description - Bayes' Law


$$P(C \mid A) = \frac{P(A \cap C)}{P(A)} = \frac{P(A \mid C)\,P(C)}{P(A)}$$


• C is the class label: $C \in \{C_1, C_2, \ldots, C_n\}$

• A is the observed object attributes: $A = (a_1, a_2, \ldots, a_m)$

• P(C | A) is the probability of C given A is observed 

► Called the conditional probability 



Reverend Thomas Bayes 
Technical Description - Bayes' Law


• An example using Bayes' Law:

John flies frequently and likes to upgrade his seat to first class. He 
has determined that, if he checks in for his flight at least two hours 
early, the probability that he will get the upgrade is .75; otherwise, 
the probability that he will get the upgrade is .35. With his busy 
schedule, he checks in at least two hours before his flight only 40% of 
the time. Suppose John didn't receive an upgrade on his most recent 
attempt. What is the probability that he arrived late? 

• C = John arrives late (checks in less than two hours before his flight)

• P(C) = Probability John arrives late = 1 - .4 = .6

• A = John did not receive an upgrade 

• P(A) = Probability John did not receive an upgrade = 

1 - ( .4 x .75 + .6 x .35) = 1 -.51 = .49 
Technical Description - Bayes' Law


• An example using Bayes' Law (continued):

• P(A | C) = Probability that John did not receive an upgrade given 
that he arrived late = 1 - .35 = .65 

• P(C | A) = Probability that John arrived late given that he did not 
receive his upgrade = P(A|C)P(C)/P(A) = (.65 x .6)/.49 = .80 


In this simple example, C can take one of two possible values
{arriving early, arriving late} and there is only one attribute, which
can take one of two possible values {received upgrade, did not
receive upgrade}.
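The arithmetic in the upgrade example can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Worked check of the upgrade example using Bayes' Law.
p_early = 0.40                      # P(checks in at least two hours early)
p_late = 1.0 - p_early              # P(C): John arrives late
p_upgrade_given_early = 0.75
p_upgrade_given_late = 0.35

# P(A): total probability of NOT receiving an upgrade
p_no_upgrade = 1.0 - (p_early * p_upgrade_given_early
                      + p_late * p_upgrade_given_late)

# P(A | C): no upgrade given a late arrival
p_no_upgrade_given_late = 1.0 - p_upgrade_given_late

# Bayes' Law: P(C | A) = P(A | C) P(C) / P(A)
p_late_given_no_upgrade = p_no_upgrade_given_late * p_late / p_no_upgrade

print(round(p_no_upgrade, 2))             # 0.49
print(round(p_late_given_no_upgrade, 2))  # 0.8
```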
Apply the Naive Assumption and Remove a Constant


• For observed attributes $A = (a_1, a_2, \ldots, a_m)$, we want to compute

$$P(C_i \mid A) = \frac{P(a_1, a_2, \ldots, a_m \mid C_i)\,P(C_i)}{P(a_1, a_2, \ldots, a_m)}, \quad i = 1, 2, \ldots, n$$

and assign the class label $C_i$ with the largest $P(C_i \mid A)$


• Two simplifications to the calculations

► Apply the naive assumption: each $a_j$ is conditionally independent of the others given $C_i$, so

$$P(a_1, a_2, \ldots, a_m \mid C_i) = P(a_1 \mid C_i)\,P(a_2 \mid C_i) \cdots P(a_m \mid C_i) = \prod_{j=1}^{m} P(a_j \mid C_i)$$

► The denominator $P(a_1, a_2, \ldots, a_m)$ is the same for every class label and can be ignored
Building a Naive Bayesian Classifier


• Applying the two simplifications

$$P(C_i \mid A) \propto \left( \prod_{j=1}^{m} P(a_j \mid C_i) \right) P(C_i), \quad i = 1, 2, \ldots, n$$

• To build a Naive Bayesian Classifier, collect the following
statistics from the training data:

► $P(C_i)$ for all the class labels

► $P(a_j \mid C_i)$ for all possible $a_j$ and $C_i$

► Assign the class label $C_i$ that maximizes the value of

$$\left( \prod_{j=1}^{m} P(a_j \mid C_i) \right) P(C_i), \quad i = 1, 2, \ldots, n$$
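The statistics listed above can be estimated by simple counting. The following is a minimal sketch, assuming relative-frequency estimates with no smoothing. The function names `train_naive_bayes` and `score` are hypothetical, and the five-row sample (job, housing, credit class) is a small illustrative subset, not the full training set.

```python
from collections import Counter

def train_naive_bayes(rows, labels):
    """Estimate P(C_i) and P(a_j | C_i) by relative frequency."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # pair_counts[(j, value, c)] = number of rows of class c
    # whose j-th attribute equals `value`
    pair_counts = Counter()
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            pair_counts[(j, value, c)] += 1
    cond = {key: cnt / class_counts[key[2]] for key, cnt in pair_counts.items()}
    return priors, cond

def score(priors, cond, x):
    """Unnormalized score prod_j P(a_j | C_i) * P(C_i) for each class."""
    scores = {}
    for c, prior in priors.items():
        s = prior
        for j, value in enumerate(x):
            s *= cond.get((j, value, c), 0.0)  # unseen pair gives probability zero
        scores[c] = s
    return scores

# Illustrative sample: (job, housing) -> credit class
rows = [("skilled", "own"), ("skilled", "own"), ("unskilled resident", "own"),
        ("skilled", "for free"), ("skilled", "for free")]
labels = ["good", "bad", "good", "good", "bad"]

priors, cond = train_naive_bayes(rows, labels)
scores = score(priors, cond, ("skilled", "own"))
predicted = max(scores, key=scores.get)
print(predicted)  # good
```

Because unseen attribute/class pairs get probability zero here, a single missing pair wipes out the whole product, which is one reason smoothing is used in practice.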
Naive Bayesian Classifiers for the Credit Example

• Class labels: {good, bad} 

► P(good) = 0.7 

► P(bad) = 0.3 

• Conditional Probabilities 

► P(own|bad) =0.62 

► P(own|good) = 0.75 

► P(rent | bad) =0.23 

► P(rent|good) =0.14 

► ... and so on 
Naive Bayesian Classifier for a Particular Applicant


Given an example of an applicant whose attributes are

A = {female single, owns home, self-employed, savings > $1000}

| $a_j$ | $C_i$ | $P(a_j \mid C_i)$ |
|---|---|---|
| female single | good | 0.28 |
| female single | bad | 0.36 |
| own | good | 0.75 |
| own | bad | 0.62 |
| self emp | good | 0.14 |
| self emp | bad | 0.17 |
| savings>1K | good | 0.06 |
| savings>1K | bad | 0.02 |

P(good|A) ~ (0.28*0.75*0.14*0.06)*0.7 = 0.0012
P(bad|A) ~ (0.36*0.62*0.17*0.02)*0.3 = 0.0002

Since P(good | A) > P(bad | A), assign the applicant the label "good" credit.
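The two scores for this applicant can be reproduced directly from the priors and conditional probabilities given on the slides. This is a minimal sketch; the dictionary layout is an illustrative choice.

```python
import math

# Priors and conditional probabilities as given for the credit example
priors = {"good": 0.7, "bad": 0.3}
cond = {
    # order: female single, own, self emp, savings>1K
    "good": [0.28, 0.75, 0.14, 0.06],
    "bad": [0.36, 0.62, 0.17, 0.02],
}

# Score for each class: product of conditionals times the prior
scores = {c: priors[c] * math.prod(cond[c]) for c in priors}
label = max(scores, key=scores.get)

print(round(scores["good"], 4))  # 0.0012
print(round(scores["bad"], 4))   # 0.0002
print(label)                     # good
```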
Naive Bayesian Implementation Considerations


• Numerical underflow

► Resulting from multiplying several probabilities near zero

► Preventable by summing the logarithms of the probabilities instead of multiplying them

• Zero probabilities due to unobserved attribute/class label pairs

► Resulting from rare events

► Handled by smoothing (adjusting each probability by a small amount)

• Assign the class label $C_i$ that maximizes the value of

$$\left( \sum_{j=1}^{m} \log P'(a_j \mid C_i) \right) + \log P(C_i)$$

where i = 1, 2, ..., n and
P' denotes the adjusted probabilities
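Both considerations can be illustrated in a few lines. This is a sketch under stated assumptions: the smoothing shown is Laplace (add-alpha) smoothing, one common way to adjust the probabilities, and the helper names `smoothed_prob` and `log_score` are hypothetical.

```python
import math

def smoothed_prob(pair_count, class_count, n_levels, alpha=1.0):
    """Laplace (add-alpha) estimate of P'(a_j | C_i).

    pair_count  : training rows of class C_i with attribute value a_j
    class_count : training rows of class C_i
    n_levels    : number of distinct values the attribute can take
    """
    return (pair_count + alpha) / (class_count + alpha * n_levels)

def log_score(prior, cond_probs):
    """sum_j log P'(a_j | C_i) + log P(C_i)."""
    return sum(math.log(p) for p in cond_probs) + math.log(prior)

# Underflow: the direct product of 400 small probabilities collapses to 0.0,
# while the sum of logarithms stays finite and comparable across classes.
probs = [0.01] * 400
direct = math.prod(probs)
logged = log_score(0.5, probs)
print(direct)        # 0.0
print(logged < 0)    # True

# Zero counts: an unobserved pair still gets a small positive probability.
print(smoothed_prob(0, 300, 3) > 0)  # True
```

Because the logarithm is monotonic, the class that maximizes the log score is the same class that would maximize the original product.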
Diagnostics

• Hold-out data 

► How well does the model classify new instances? 



• Cross-validation 

• ROC curve/AUC 
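As a rough illustration of cross-validation, the sketch below splits record indices into k folds so that each record is held out exactly once. The function name `k_fold_indices` and the fixed seed are assumptions made for this example. In practice the classifier would be trained on each `train` list and evaluated on the matching `test` list.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs covering range(n) in k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), len(test))  # 8 2 on each of the five lines
```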
Diagnostics: Confusion Matrix

| Actual class | Predicted good | Predicted bad | Total |
|---|---|---|---|
| good | 671 (true positives, TP) | 29 (false negatives, FN) | 700 |
| bad | 38 (false positives, FP) | 262 (true negatives, TN) | 300 |
| Total | 709 | 291 | 1000 |

Overall success rate (or accuracy):
(TP + TN) / (TP + TN + FP + FN) = (671 + 262)/1000 ≈ 0.93

Recall (or TPR): TP / (TP + FN) = 671 / (671 + 29) = 671/700 ≈ 0.96
what percent of positive instances we correctly identified

FPR: FP / (FP + TN) = 38 / (38 + 262) = 38/300 ≈ 0.13
what percent of negatives we marked positive

FNR: FN / (TP + FN) = 29 / (671 + 29) = 29/700 ≈ 0.04
what percent of positives we marked negative

Precision: TP / (TP + FP) = 671/709 ≈ 0.95
what percent of things we marked positive really are positive
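The metrics above follow directly from the four cells of the confusion matrix. A minimal sketch that recomputes them, treating "good" as the positive class as the slide does:

```python
# Cell counts from the confusion matrix ("good" is the positive class)
tp, fn, fp, tn = 671, 29, 38, 262
total = tp + fn + fp + tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)       # true positive rate (TPR)
fpr = fp / (fp + tn)          # false positive rate
fnr = fn / (tp + fn)          # false negative rate, equal to 1 - recall
precision = tp / (tp + fp)

for name, value in [("accuracy", accuracy), ("recall", recall),
                    ("FPR", fpr), ("FNR", fnr), ("precision", precision)]:
    print(f"{name}: {value:.2f}")
# accuracy: 0.93
# recall: 0.96
# FPR: 0.13
# FNR: 0.04
# precision: 0.95
```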
Naive Bayesian Classifier - Reasons to Choose (+) and Cautions (-)

| Reasons to Choose (+) | Cautions (-) |
|---|---|
| Handles missing values quite well | Numeric variables have to be discrete (categorized into intervals) |
| Robust to irrelevant variables | Sensitive to correlated variables ("double-counting") |
| Easy to implement | Not good for estimating probabilities; stick to class label or yes/no, use for class label assignments only |
| Easy to score (predict) data | |
| Resistant to over-fitting | |
| Computationally efficient | |
| Handles very high dimensional problems | |
| Handles categorical variables with a lot of levels | |
Check Your Knowledge


1. Consider the following training data set:

| X1 | X2 | X3 | Y |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 1 |

• Apply the Naive Bayesian Classifier to this data set and compute the probability
score for P(y = 1 | X) for X = (1,0,0)

Show your work


2. List some prominent use cases of the Naive Bayesian Classifier. 

3. What gives the Naive Bayesian Classifier the advantage of being 
computationally inexpensive? 


4. Why should we use log-likelihoods rather than pure probability 
values in the Naive Bayesian Classifier? 
Check Your Knowledge (Continued)

5. What is a confusion matrix and how is it used to evaluate the
effectiveness of the model?

6. Consider the following data set with two input features,
temperature and season:

| Temperature | Season | Electricity Usage |
|---|---|---|
| -10 to 50 F | Winter | High |
| 50 to 70 F | Winter | Low |
| 70 to 85 F | Summer | Low |
| 85 to 110 F | Summer | High |

• What is the Naive Bayesian assumption?

• Is the Naive Bayesian assumption satisfied for this problem?



Module 4: Advanced Analytics - Theory and 
Methods 

Part 5: Naive Bayesian Classifiers - Summary 

During this Part the following topics were covered: 

• Naive Bayesian Classifier 

• Theoretical foundations of the classifier 

• Use cases 

• Evaluating the effectiveness of the classifier 

• The Reasons to Choose (+) and Cautions (-) with the use of 
the classifier 
Lab Exercise 8: Naive Bayesian Classifier

This Lab is designed to investigate and practice the 
Naive Bayesian Classifier analytic technique. 

After completing the tasks in this lab you should be able 
to: 

• Use R functions for Naive Bayesian Classification 

• Apply the requirements for generating 
appropriate training data 

• Validate the effectiveness of the Naive Bayesian
Classifier with big data
Lab Exercise 8: Naive Bayesian Classifier Part 1 - Workflow

1. Set working directory and review training and test data

2. Install and load library "e1071"

3. Read in and review data

4. Build the Naive Bayesian classifier model from first principles

5. Predict the results

6. Use the naiveBayes function

7. Predict the outcome of "Enrolls" with the test data

8. Use Laplace smoothing
Lab Exercise 8: Naive Bayesian Classifier Part 2 - Workflow

1. Define the problem (translating to an analytics question)

2. Open the ODBC connection

3. Build the training dataset and the test dataset from the database

4. Extract the first 10000 records for the training data set and the remaining 10 for the test

5. Execute the NB classifier

6. Validate the effectiveness of the NB classifier with a confusion matrix

7. Execute NB classifier with MADlib function calls within the database